library(ggplot2)
library(forcats)
library(reshape2)
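# Shrink axis labels to improve readability (affects base graphics only)
par(cex.axis = 0.8)
# Load data; stringsAsFactors = TRUE keeps categorical columns as factors
# (the default became FALSE in R 4.0), which the plot() calls below rely on
df <- read.csv("https://raw.githubusercontent.com/UC-MACSS/persp-analysis/master/assignments/exploratory-data-analysis/data/gss2012.csv", stringsAsFactors = TRUE)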
# Remove year and id as they are not meaningful
df <- df[,-c(1,2)]
This section seeks to answer the following questions:
* Are there outliers in the data?
* Do I have missingness? Are there patterns to it?
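# Count missing values per column, most-missing first
missing_data <- sort(colSums(is.na(df)), decreasing = TRUE)
barplot(missing_data, horiz = TRUE, las = 1)
plot(missing_data)
hist(missing_data)
# Focus on the 20 columns with the most missing values
missing <- head(missing_data, n = 20)
dotchart(missing)
barplot(missing, horiz = TRUE, las = 1)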
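The counts above show how much is missing in each column, but not whether columns go missing together. A minimal sketch of two pattern checks in base R, run on a hypothetical toy data frame (`toy`) rather than the GSS extract: it tallies which combination of columns is missing in each row, and correlates 0/1 missingness indicators so that columns tending to be missing for the same respondents stand out.

```r
# Toy stand-in for df (hypothetical values, not GSS data)
toy <- data.frame(
  a = c(1, NA, 3, NA, 5),
  b = c(NA, 2, 3, NA, 5),
  c = c(1, 2, 3, 4, 5)
)

# Share of missing values per column (a proportion rather than a count)
prop_missing <- colMeans(is.na(toy))

# Label each row by the set of columns it is missing, e.g. "a+b" or "none",
# then count how often each combination occurs
row_pattern <- apply(is.na(toy), 1, function(r) {
  if (any(r)) paste(names(r)[r], collapse = "+") else "none"
})
pattern_counts <- sort(table(row_pattern), decreasing = TRUE)

# Correlate 0/1 missingness indicators; never-missing columns have zero
# variance, so they are dropped first to avoid NA correlations
na_ind <- 1 * is.na(toy)
na_ind <- na_ind[, colSums(na_ind) > 0, drop = FALSE]
na_cor <- cor(na_ind)

print(pattern_counts)
print(round(na_cor, 2))
```

To try this on the real data, replacing `toy` with `df[, names(missing)]` keeps the correlation matrix limited to the 20 most-missing columns.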
hist(df$age)
plot(df$race)
plot(df$sex)
polview <- ordered(df$polviews, levels = c("ExtrmCons", "Conserv", "SlghtCons", "Moderate", "SlghtLib", "Liberal", "ExtrmLib"))
plot(polview)
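For the outlier question, one common rule (Tukey's fences, the idea behind `boxplot()` whiskers) flags values more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies it to hypothetical toy ages; `tukey_outliers()` is an illustrative helper, not part of any package.

```r
# Flag values beyond Tukey's fences: k * IQR outside the quartiles
# (hypothetical helper, base R only)
tukey_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  lower <- q[1] - k * spread
  upper <- q[2] + k * spread
  # Positions of non-missing values that fall outside the fences
  which(!is.na(x) & (x < lower | x > upper))
}

# Toy ages (hypothetical), including one missing and one implausible value
ages <- c(23, 35, 41, 38, 29, 44, 52, 31, NA, 120)
idx <- tukey_outliers(ages)
ages[idx]
```

Something like `sapply(Filter(is.numeric, df), function(col) length(tukey_outliers(col)))` would count flagged values per numeric column. `boxplot.stats()` computes its hinges with `fivenum()`, so its counts can differ slightly from this `quantile()`-based version, and on bounded 1-9 scales the rule says little.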
plot(df$abany ~ df$abdefect)
plot(df$abany ~ df$abdefect + df$abhlth + df$abnomore + df$abpoor + df$abrape + df$absingle)
wrkwayup <- ordered(df$wrkwayup, levels = c("AGREE STRONGLY", "AGREE SOMEWHAT", "NEITHER AGREE NOR DISAGREE", "DISAGREE SOMEWHAT", "DISAGREE STRONGLY"))
plot(df$spkrac ~ wrkwayup)
plot(df$closeblk ~ df$spkrac + wrkwayup)
hist(df$closeblk)
plot(df$spkrac)
plot(wrkwayup)
Interestingly, “closeness” to black individuals shows no clear association with the other variables plotted here.
The plot of closeblk against wrkwayup stands out for its final category, strong disagreement (“DISAGREE STRONGLY”): it has a wider distribution and contains only values indicating strong “closeness” to black individuals. This suggests the question may be flawed, perhaps drawing out highly charged or non-representative answers. The unusually similar distributions across the other categories support this reading.
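Both claims (little association overall, and unusually similar distributions across categories) can be checked with a rank-based test. A minimal sketch using base R's `kruskal.test()` on hypothetical toy scores (`score`, `grp`), assuming the closeness item is treated as ordinal:

```r
# Toy ordinal scores (hypothetical) in three groups of five
score <- c(5, 5, 6, 7, 5,   5, 6, 5, 7, 5,   9, 9, 8, 9, 9)
grp <- factor(rep(c("A", "B", "C"), each = 5))

# Group medians show the size and direction of any difference
medians <- tapply(score, grp, median)

# Kruskal-Wallis rank test; kruskal.test() applies the usual tie correction
kw <- kruskal.test(score ~ grp)
medians
kw$p.value
```

On the real data this would be `kruskal.test(df$closeblk ~ wrkwayup)`. With thousands of respondents even small differences can reach significance, so the group medians are worth reading alongside the p-value.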
# Select the columns by name, keeping their original names and types
tdf <- df[, c("closeblk", "closewht", "race")]
# Use geom_jitter() alone: adding geom_point() too drew every point twice
ggplot(tdf, aes(closeblk, closewht)) +
  geom_jitter()
## Warning: Removed 680 rows containing missing values (geom_point).
ggplot(tdf, aes(closeblk, closewht, colour = race)) +
  geom_jitter()
## Warning: Removed 680 rows containing missing values (geom_point).
* The jittered dot plot supports the suppositions above. Many people choose round values (5, 8), and there is a clear tendency to select the same value for both questions.
* The question therefore seems flawed in some way, as many of these answers are likely not representative of respondents' true internal state.
* It is worth noting, however, that closeness to white people is clearly more common than closeness to black people.
* This is probably due to the large share of white respondents in the dataset.
* Notice, too, that responses seem to follow a one-or-the-other pattern (close to black or close to white). The “other” race group is too small to be represented accurately.
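The “same value for both questions” observation can be put in numbers by comparing the observed share of identical answers with the share expected if the two items were independent (the sum, over scale points, of the product of the two marginal proportions). A sketch with hypothetical helper functions and toy answers on a 1-9 scale:

```r
# Observed share of complete pairs where both answers are identical
same_answer_share <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  mean(x[ok] == y[ok])
}

# Share expected by chance if x and y were independent
expected_same_share <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  lv <- sort(unique(c(x[ok], y[ok])))
  # Marginal distribution of each item over the shared set of values
  px <- table(factor(x[ok], levels = lv)) / sum(ok)
  py <- table(factor(y[ok], levels = lv)) / sum(ok)
  sum(px * py)
}

# Toy answers (hypothetical), not GSS responses
closeblk_toy <- c(5, 8, 5, 9, 1, 5, 7, 8)
closewht_toy <- c(5, 8, 6, 9, NA, 5, 9, 8)
observed <- same_answer_share(closeblk_toy, closewht_toy)
expected <- expected_same_share(closeblk_toy, closewht_toy)

# Frequency of each scale point, to spot heaping on particular values
freq <- table(factor(closeblk_toy, levels = 1:9))
c(observed = observed, expected = expected)
```

Calling both helpers on `df$closeblk` and `df$closewht` would show how far the observed agreement exceeds the chance baseline, and `table(df$closeblk)` shows which scale points attract extra responses.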
# Subset by name: cbind() would coerce race (a factor) to integer codes,
# turning the colour legend below into a continuous scale
tdf <- na.omit(df[, c("closeblk", "black_traits", "white_traits", "closewht", "race")])
ggplot(tdf, aes(closeblk, black_traits, colour = race)) +
  geom_jitter()
ggplot(tdf, aes(closewht, white_traits, colour = race)) +
  geom_jitter()
* The codebook does not clarify what exactly the black_traits and white_traits variables mean. Based on the wording of the entry, I will assume they measure whether respondents are able to identify stereotypes of a particular group, as the brief description suggests.
* If that is so, the plots suggest that individuals in general struggle to describe their closeness to a group, as assessed by their ability to identify stereotypes (a proxy for knowledge about the life and social conditions of that group).
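If the traits variables are numeric scores, the pattern in these plots can be summarized with a rank correlation computed separately for each race. A minimal sketch; `cor_by_group()` and the toy vectors are hypothetical, and applying it to `tdf` assumes black_traits and white_traits are numeric:

```r
# Spearman correlation of x and y within each level of g (hypothetical helper)
cor_by_group <- function(x, y, g) {
  parts <- split(data.frame(x = x, y = y), g)
  sapply(parts, function(d) {
    cor(d$x, d$y, method = "spearman", use = "complete.obs")
  })
}

# Toy data (hypothetical): y rises with x in group A and falls in group B
x <- c(1, 2, 3, 4, 1, 2, 3, 4)
y <- c(2, 4, 6, 8, 8, 6, 4, 2)
g <- rep(c("A", "B"), each = 4)
r <- cor_by_group(x, y, g)
r
```

For example, `with(tdf, cor_by_group(closeblk, black_traits, race))`. Values near zero would fit the reading above; small groups such as “other” give unstable estimates.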
ggplot(tdf, aes(closewht, white_traits)) +
geom_count()
This section looks into the following questions:
* How much variation/error exists in my statistical estimates? Is there a pattern to it?
* What type of variation occurs within my variables?
* What type of covariation occurs between my variables?
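For covariation between two categorical variables, jittered scatterplots are hard to compare across pairs. Cramér's V (the chi-squared statistic rescaled to lie between 0 and 1) gives one number per pair. A sketch built on base R's `chisq.test()`; `cramers_v()` is a hypothetical helper and the inputs are toy vectors:

```r
# Cramer's V for two categorical vectors (hypothetical helper, base R only)
cramers_v <- function(x, y) {
  # factor() drops unused levels so no all-zero rows or columns reach the test
  tab <- table(factor(x), factor(y))
  k <- min(dim(tab))
  if (k < 2) return(NA_real_)  # undefined when either variable has one level
  # Small tables trigger a "Chi-squared approximation may be incorrect" warning
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Toy inputs (hypothetical): perfectly associated, then unrelated
v_perfect <- cramers_v(c("a", "a", "b", "b"), c("u", "u", "v", "v"))
v_none <- cramers_v(c("a", "a", "b", "b"), c("u", "v", "u", "v"))
print(c(v_perfect, v_none))
```

For instance `cramers_v(df$marital, df$wrkstat)` or `cramers_v(df$getahead, df$race)`. Cramér's V ignores category order, so for ordinal pairs a rank correlation is usually more informative.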
plot(df[,1:10])
# Refer to columns by bare name inside aes(); df$ bypasses ggplot's data argument
ggplot(df, aes(marital, wrkstat, colour = sex)) +
  geom_jitter()
plot(df$wrkslf ~ df$age)
ggplot(df, aes(x = wrkgvt, y = polviews)) +
geom_bin2d()
plot(df$getahead ~ df$race)
fechld <- ordered(df$fechld, c("STRONGLY AGREE", "AGREE", "DISAGREE", "STRONGLY DISAGREE"))
plot(df$kids ~ fechld)
conclerg <- as.character(df$conclerg)
conclerg <- ordered(conclerg, levels = c("A GREAT DEAL","ONLY SOME", "HARDLY ANY"))
confed <- as.character(df$confed)
confed <- ordered(confed, levels = c("A GREAT DEAL","ONLY SOME", "HARDLY ANY"))
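# Build the frame directly: cbind() would reduce the ordered factors to integer codes
tdf <- na.omit(data.frame(conclerg, confed))
# geom_violin() needs a continuous y, so plot confed's level codes (1 = A GREAT DEAL, 3 = HARDLY ANY)
ggplot(tdf) +
  geom_violin(aes(x = conclerg, y = as.integer(confed)))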
plot(conclerg ~ confed)
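Since conclerg and confed are both ordered factors on the same three-point scale, their association can also be summarized with Kendall's rank correlation on the level codes. A sketch using `cor(method = "kendall")` from base R; `ordinal_assoc()` and the toy factors are hypothetical:

```r
# Kendall rank correlation between two ordered factors (hypothetical helper)
ordinal_assoc <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)  # keep complete pairs only
  # as.integer() returns each value's position in the level order
  cor(as.integer(x[ok]), as.integer(y[ok]), method = "kendall")
}

# Toy factors (hypothetical) on the same three-point confidence scale
lv <- c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY")
x <- ordered(c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", NA), levels = lv)
y_same <- ordered(c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "ONLY SOME"), levels = lv)
y_rev <- ordered(c("HARDLY ANY", "ONLY SOME", "A GREAT DEAL", "A GREAT DEAL"), levels = lv)
c(same = ordinal_assoc(x, y_same), reversed = ordinal_assoc(x, y_rev))
```

`ordinal_assoc(conclerg, confed)` would give the value for the GSS data; because both scales run from “A GREAT DEAL” to “HARDLY ANY”, a positive value means respondents with more confidence in one institution tend to report more confidence in the other.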
obey <- as.character(df$obey)
obey <- ordered(obey, levels = c("MOST IMPORTANT", "2ND IMPORTANT", "3RD IMPORTANT", "4TH IMPORTANT", "LEAST IMPORTANT"))
plot(conclerg ~ obey)
ggplot(data.frame(conclerg, obey), aes(x = conclerg, y = obey)) +
  geom_jitter()
plot(df$militarist_tol ~ df$con_govt)
deg <- ordered(df$degree, levels = c("<HS", "HS", "Junior Coll", "Bachelor deg", "Graduate deg") )
plot(df$con_govt ~ deg)
plot( deg ~ df$con_govt + df$conarmy + df$conclerg + df$conbus + df$coneduc + df$contv + df$consci)
plot(df$gunlaw ~ df$con_govt)
plot(df$gunlaw ~ df$conlegis)
hap <- ordered(df$happy, levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY"))
plot(hap ~ polview)
Social Issue Associations
Abortion
Attitudes towards unconditional abortion access compared to the perceived amount of spending on law enforcement.
Slight increase in the “too little” category.
This may reflect feelings towards authority, morality, etc.
This supports the hunch above: attitudes towards abortion are also tied to beliefs about authority and tolerance.
Note that people who would not allow an atheist to speak in their town are also more likely to oppose unconditional abortion access.
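Comparisons like these come down to the share supporting unconditional abortion within each level of a tolerance item. A minimal sketch that tabulates those row shares and runs a chi-squared test of independence; `share_by_group()` is a hypothetical helper, and the YES/NO and ALLOWED/NOT ALLOWED labels are toy values:

```r
# Share of each outcome within each level of a grouping item, plus a
# chi-squared test of independence (hypothetical helper, base R only)
share_by_group <- function(outcome, group) {
  tab <- table(group = group, outcome = outcome)
  list(
    shares = prop.table(tab, margin = 1),     # each row sums to 1
    test = suppressWarnings(chisq.test(tab))  # warns on small counts
  )
}

# Toy data (hypothetical), not GSS responses
outcome <- c("YES", "YES", "NO", "YES", "NO", "NO", "NO", "YES")
group <- rep(c("ALLOWED", "NOT ALLOWED"), each = 4)
res <- share_by_group(outcome, group)
print(round(res$shares, 2))
res$test$p.value
```

Something like `share_by_group(df$abany, df$spkrac)` would apply it to the speaker item used earlier in this notebook.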
Repeating the same process with a less jingoistic item, an individual claiming that black people are genetically inferior:
Support for Abortion and Religion
Let’s try people who feel they were reborn; this should capture a broader range of Christians than denomination alone. It also gets at the more committed believers, who, per my theory, should be strongly and negatively associated with support for abortion.
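One way to test whether being reborn is negatively associated with support for abortion is a logistic regression with `glm(family = binomial)`. The sketch below fits it to hypothetical toy data; the `reborn` column and the YES/NO coding are assumptions about the extract, not confirmed by it.

```r
# Toy data (hypothetical); column names mirror the assumed GSS items
toy <- data.frame(
  abany  = c("YES", "YES", "YES", "NO", "YES", "NO", "NO", "NO", "YES", "NO"),
  reborn = factor(c("NO", "NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES"))
)

# I() builds a TRUE/FALSE response: TRUE = supports unconditional access
fit <- glm(I(abany == "YES") ~ reborn, family = binomial, data = toy)

# exp(coefficient) is the odds ratio for reborn "YES" versus "NO"
odds_ratio <- exp(coef(fit)[["rebornYES"]])
odds_ratio
```

On the real data, something like `glm(I(abany == "YES") ~ reborn, family = binomial, data = df)` would apply, assuming those columns and codes exist; an odds ratio below 1 would support the theory.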
Abortion and Attitudes on Sexual Practices
Economic Equality and Attitudes on Sexual Practices
Economic Inequality Associations